Sentential Representations in Distributional Semantics

نویسنده

  • Nghia The Pham
چکیده

This thesis is about the problem of representing sentential meaning in distributional semantics. Distributional semantics obtains the meanings of words through their usage, based on the hypothesis that words occurring in similar contexts will have similar meanings. In this framework, words are modeled as distributions over contexts and are represented as vectors in high dimensional space. Compositional distributional semantics attempts to extend this approach to higher linguistics structures. Some basic composition models proposed in literature to obtain the meaning of phrases or possibly sentences show promising results in modeling simple phrases. The goal of the thesis is to further extend these composition models to obtain sentence meaning representations. The thesis puts more focus on unsupervised methods which make use of the context of phrases and sentences to optimize the parameters of a model. Three different methods are presented. The first model is the PLF model, a practical composition and linguistically motivated model which is based on the lexical function model introduced by Baroni and Zamparelli (2010) and Coecke et al. (2010). The second model is the Chunk-based Smoothed Tree Kernels (CSTKs) model, extending Smoothed Tree Kernels (Mehdad et al., 2010) by utilizing vector representations of chunks. The final model is the C-PHRASE model, a neural network-based approach, which jointly optimizes the vector representations of words and phrases using a context predicting objective. The thesis makes three principal contributions to the field of compositional distributional semantics. The first is to propose a general framework to estimate the parameters and evaluate the basic composition models. This provides a fair way to comparing the models using a set of phrasal datasets. The second is to extend these basic models to the sentence level, using syntactic information to build up the sentence vectors. The third contribution is to evaluate all the proposed models, showing that they perform on par with or outperform competing models presented in the literature. Thesis Supervisor: Marco Baroni Title: Associate Professor

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Compositionality in (high-dimensional) space

Formal semanticists have developed sophisticated compositional theories of sentential meaning, paying a lot of attention to those grammatical words (determiners, logical connectives, etc.) that constitute the functional scaffolding of sentences. Corpus-based computational linguists, on the other hand, have developed powerful distributional methods to induce representations of the meaning of con...

متن کامل

JoBimText Visualizer: A Graph-based Approach to Contextualizing Distributional Similarity

We introduce an interactive visualization component for the JoBimText project. JoBimText is an open source platform for large-scale distributional semantics based on graph representations. First we describe the underlying technology for computing a distributional thesaurus on words using bipartite graphs of words and context features, and contextualizing the list of semantically similar words t...

متن کامل

Distributed representations for compositional semantics

The mathematical representation of semantics is a key issue for Natural Language Processing (NLP). A lot of research has been devoted to finding ways of representing the semantics of individual words in vector spaces. Distributional approaches—meaning distributed representations that exploit co-occurrence statistics of large corpora—have proved popular and successful across a number of tasks. H...

متن کامل

Aligning Packed Dependency Trees: a theory of composition for distributional semantics

We present a new framework for compositional distributional semantics in which the distributional contexts of lexemes are expressed in terms of anchored packed dependency trees. We show that these structures have the potential to capture the full sentential contexts of a lexeme and provide a uniform basis for the composition of distributional knowledge in a way that captures both mutual disambi...

متن کامل

Towards a semantics for distributional representations

Distributional representations have recently been proposed as a general-purpose representation of natural language meaning, to replace logical form. There is, however, one important difference between logical and distributional representations: Logical languages have a clear semantics, while distributional representations do not. In this paper, we propose a semantics for distributional represen...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016